Model specification and prior predictive simulation for ONS smoking and COVID-19 study

true , true , true , true , true , true
2021-05-17

Selecting the number of knots for cubic spline terms

Time in the pandemic will be entered as cubic spline terms with a) 8 knots and b) 11 knots (i.e. months in the pandemic).

The selection of a) was based on the visual inspection of the dynamics of positive tests in the UK (i.e. date of specimen collection), using data from the UK Government COVID-19 Dashboard:

Figure 1: UK COVID-19 case data by specimen date from 2020-04-26 to 2021-01-31.

Inflection points (or knots) were selected through visual inspection. Time points at which cases were rising (or falling) were grouped into segments, with the transition from one segment to the next representing a knot, resulting in a total of 8 knots. Linear lines of best fit were drawn between the knots to illustrate (see Figure 2).

Segments used to select the number of knots.

Figure 2: Segments used to select the number of knots.

Prior predictive simulations for RQs 1-4

RQ1 - current vs. never smoking

Our living review found that the relative risk of SARS-CoV-2 infection in current smokers was 29% lower than in never smokers (RR = 0.71, CrI = 0.61, 0.82). Therefore, we will use an informative prior for current smokers with the mean set to the logarithm of 0.71 and the standard deviation set to the logarithm of the upper CrI minus the logarithm of the lower CrI divided by 3.92, i.e. normal(-0.34, 0.08). As priors are defined on the linear scale, we will use the logarithm of the relative risk. We will further use a weakly informative prior for never smokers. We will specify the distribution as normal(0, 2.5), which rules out implausibly large effects and the centering on zero means that negative and positive values are equally plausible (see Figure 3).

rq1_mean <- log(0.71)
rq1_sd <- (log(0.82)-log(0.61))/3.92

rq1_csmok <- inv.logit(rnorm(10000,mean=rq1_mean, sd=rq1_sd))

rq1_nsmok <- inv.logit(rnorm(10000,mean=0, sd=0.25))

dat_rq1 <- tibble(proportion = c(rq1_csmok, rq1_nsmok),
              smok_status = c(rep("Current smokers", 10000),
                              rep("Never smokers", 10000)))

ggplot(dat_rq1) +
  geom_histogram(aes(x = proportion)) +
  facet_wrap(~ smok_status) +
  theme_minimal() +
  theme(panel.border=element_rect(colour = "black", size = 1, fill = NA)) +
  labs(y = element_blank(),
      x = "Proportion testing positive for SARS-CoV-2")
Histogram of the prior predictions for the proportion of current and never smokers testing positive for SARS-CoV-2 infection, simulated from 10,000 samples with parameters on the logit scale. The inverse logit function was applied to obtain probabilities.

Figure 3: Histogram of the prior predictions for the proportion of current and never smokers testing positive for SARS-CoV-2 infection, simulated from 10,000 samples with parameters on the logit scale. The inverse logit function was applied to obtain probabilities.

RQ2 - former vs. never smoking

Our living review has shown that the relative risk of SARS-CoV-2 infection in former smokers was 3% greater than in never smokers (RR = 1.03, CrI = 0.95, 1.11). We will use an informative prior with the mean set to the logarithm of 1.03 and the standard deviation set to the logarithm of the upper CrI minus the logarithm of the lower CrI divided by 3.92, i.e. normal(0.03, 0.04). We will further use a weakly informative prior for never smokers. We will specify the distribution as normal(0, 2.5) (see Figure 4).

rq2_mean <- log(1.03)
rq2_sd <- (log(1.11)-log(0.95))/3.92

rq2_fsmok <- inv.logit(rnorm(10000,mean=rq2_mean, sd=rq2_sd))

rq2_nsmok <- inv.logit(rnorm(10000,mean=0, sd=0.25))

dat_rq2 <- tibble(proportion = c(rq2_fsmok, rq2_nsmok),
                  smok_status = c(rep("Former smokers", 10000),
                                  rep("Never smokers", 10000)))

ggplot(dat_rq2) +
  geom_histogram(aes(x = proportion)) +
  facet_wrap(~ smok_status) +
  theme_minimal() +
  theme(panel.border=element_rect(colour = "black", size = 1, fill = NA)) +
  labs(y = element_blank(),
       x = "Proportion testing positive for SARS-CoV-2")
Histogram of the prior predictions for the proportion of former and never smokers testing positive for SARS-CoV-2 infection, simulated from 10,000 samples with parameters on the logit scale. The inverse logit function was applied to obtain probabilities.

Figure 4: Histogram of the prior predictions for the proportion of former and never smokers testing positive for SARS-CoV-2 infection, simulated from 10,000 samples with parameters on the logit scale. The inverse logit function was applied to obtain probabilities.

RQ3 - nicotine users who smoke tobacco vs. nicotine users who do not smoke tobacco

Evidence on the risk of SARS-CoV-2 infection in nicotine users who smoke tobacco (vs. not) is lacking. For this comparison, we will therefore use the same informative prior as above for current smokers. The same prior will be used for nicotine users who smoke tobacco and nicotine users who do not smoke tobacco (see Figure 5).

rq3_mean <- log(0.71)
rq3_sd <- (log(0.82)-log(0.61))/3.92

rq3_nicsmok <- inv.logit(rnorm(10000,mean=rq3_mean, sd=rq3_sd))

rq3_nic <- inv.logit(rnorm(10000,mean=rq3_mean, sd=rq3_sd))

dat_rq3 <- tibble(proportion = c(rq3_nicsmok, rq3_nic),
                  smok_status = c(rep("Nicotine users who smoke", 10000),
                                  rep("Nicotine users", 10000)))

ggplot(dat_rq3) +
  geom_histogram(aes(x = proportion)) +
  facet_wrap(~ smok_status) +
  theme_minimal() +
  theme(panel.border=element_rect(colour = "black", size = 1, fill = NA)) +
  labs(y = element_blank(),
       x = "Proportion testing positive for SARS-CoV-2")
Histogram of the prior predictions for the proportion of nicotine users who smoke and nicotine users who do not smoke testing positive for SARS-CoV-2 infection, simulated from 10,000 samples with parameters on the logit scale. The inverse logit function was applied to obtain probabilities.

Figure 5: Histogram of the prior predictions for the proportion of nicotine users who smoke and nicotine users who do not smoke testing positive for SARS-CoV-2 infection, simulated from 10,000 samples with parameters on the logit scale. The inverse logit function was applied to obtain probabilities.

RQ4 - nicotine use vs. no nicotine use among people who do not smoke tobacco

For the comparison of nicotine use vs. no nicotine use among people who do not currently smoke, we will use the same informative prior as for current smokers for the nicotine users, reflecting the hypothesis that the ‘protective’ effect observed in current smokers is conferred by active nicotine use (rather than tobacco). For former and never smokers who do not use nicotine, we will use an informative prior with the mean set to the midpoint of the relative risk for never smokers (i.e. RR = 1.0) and former smokers (i.e. RR = 1.03), i.e. RR = 1.015. As there is a lack of evidence for this comparison, a larger standard deviation will be specified (see Figure 6).

rq4a_mean <- log(0.71)
rq4a_sd <- (log(0.82)-log(0.61))/3.92

rq4b_mean <- log(1.015)

rq4_nic <- inv.logit(rnorm(10000,mean=rq4a_mean, sd=rq4a_sd))

rq4_nonic <- inv.logit(rnorm(10000,mean=rq4b_mean, sd=0.25))

dat_rq4 <- tibble(proportion = c(rq4_nic, rq4_nonic),
                  smok_status = c(rep("Nicotine use", 10000),
                                  rep("No nicotine use", 10000)))

ggplot(dat_rq4) +
  geom_histogram(aes(x = proportion)) +
  facet_wrap(~ smok_status) +
  theme_minimal() +
  theme(panel.border=element_rect(colour = "black", size = 1, fill = NA)) +
  labs(y = element_blank(),
       x = "Proportion testing positive for SARS-CoV-2")
Histogram of the prior predictions for the proportion of nicotine users and those who do not use nicotine among people who do not smoke tobacco testing positive for SARS-CoV-2 infection, simulated from 10,000 samples with parameters on the logit scale. The inverse logit function was applied to obtain probabilities.

Figure 6: Histogram of the prior predictions for the proportion of nicotine users and those who do not use nicotine among people who do not smoke tobacco testing positive for SARS-CoV-2 infection, simulated from 10,000 samples with parameters on the logit scale. The inverse logit function was applied to obtain probabilities.

Covariates

For covariates, weakly informative priors will be used. We will specify a normal distribution with a mean of 0 and standard deviation of 2.5. Continuous normally distributed variables will be standardised. As we cannot be too sure about the intercept (Laurent Smeets and Rens van de Schoot 2019), we will specify a Cauchy distribution with a shape parameter of 10 (see Figure 7).

ggplot(data.frame(x = c(-5, 5)), aes(x)) +
  stat_function(fun = dcauchy, n = 1e4, args = list(location = 0, scale = 10), aes(color = "a"), size = 2) +
  scale_x_continuous(expand = c(0, 0)) +
  scale_y_continuous(expand = c(0, 0.05)) +
  scale_color_discrete(name = "",
                       labels = c("a" = expression(x[0] == 0*","~ gamma == 10))) +
  ylab("") +
  theme_bw(base_size = 24) +
  theme(legend.position = c(0.8, 0.8),
        legend.text.align = 0)
Illustration of Cauchy distribution.

Figure 7: Illustration of Cauchy distribution.

Laurent Smeets, and Rens van de Schoot. 2019. “WAMBS BRMS Tutorial: Popularity Data.” Rens van de Schoot. https://www.rensvandeschoot.com/brms-wambs/.

References